137 research outputs found

    Analysis of I-Vector framework for Speaker Identification in TV-shows

    Inspired by Joint Factor Analysis, i-vector-based analysis has become the most popular, state-of-the-art framework for the speaker verification task. The framework has mainly been applied within the NIST SRE evaluation campaigns, where many studies have sought to further improve the performance of speaker verification systems. However, while the i-vector framework has been used in other speech processing fields such as language recognition, very few studies have been reported for the speaker identification task on TV shows. This work was carried out in the context of the REPERE challenge, which focuses on person recognition in multimodal conditions (audio, video, text) from TV show corpora. Challenge participants are also invited to provide systems for monomodal tasks such as speaker identification. The application of the i-vector framework is investigated from several points of view: (1) several i-vector-based approaches are compared, (2) a specific i-vector extraction protocol is proposed to deal with widely varying amounts of training data across the speaker population, and (3) the joint use of speaker diarization and identification is analyzed. Based on a 533-speaker dictionary, this joint system won the monomodal speaker identification task of the 2014 REPERE challenge.
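    The abstract does not say which scoring back-end was retained. Cosine similarity between a test i-vector and per-speaker enrolment i-vectors is one widely used option for i-vector identification, so the following is a minimal sketch of closed-set identification under that assumption. The function names, the one-vector-per-speaker dictionary layout, and the toy 3-dimensional vectors are all hypothetical; extracting the i-vectors themselves is out of scope.

```python
import math


def cosine_score(w1, w2):
    """Cosine similarity between two i-vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    return dot / (norm1 * norm2)


def identify_speaker(test_ivector, speaker_dictionary):
    """Return (name, score) of the enrolled speaker closest to the test i-vector.

    speaker_dictionary maps a speaker name to one enrolment i-vector
    (hypothetical layout, e.g. an averaged i-vector per speaker).
    """
    best_name, best_score = None, -math.inf
    for name, model in speaker_dictionary.items():
        score = cosine_score(test_ivector, model)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Toy 3-dimensional "i-vectors"; real ones typically have a few hundred dimensions.
dictionary = {
    "speaker_A": [1.0, 0.0, 0.0],
    "speaker_B": [0.0, 1.0, 0.0],
}
name, score = identify_speaker([0.9, 0.1, 0.0], dictionary)
print(name, round(score, 3))
```

    With a 533-speaker dictionary the loop simply runs over 533 entries; a real system would usually also apply length normalization or a trained back-end (e.g. LDA/PLDA) before scoring.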

    The LIA RT’07 speaker diarization system

    This paper presents the LIA submission to the speaker diarization task of the 2007 NIST Rich Transcription (RT'07) evaluation campaign. We report a system optimised for conference meeting recordings, along with experiments on all three RT'07 subdomains and microphone conditions. Results show that, despite state-of-the-art performance in the single distant microphone (SDM) condition, the system in its current form does not effectively use the additional information available in the multiple distant microphone (MDM) condition. With post-evaluation tuning, we achieve a diarization error rate (DER) of 19% on the MDM task with conference meeting data. Some early experimental work highlights both the limitations and the potential of using between-channel delay features for diarization.
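    The DER quoted above is the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The short sketch below computes that ratio. The durations are hypothetical and were picked only so that the result lands on 19%; they are not the LIA system's actual error breakdown. NIST scoring also applies a tolerance collar around reference boundaries and derives the three error terms from an optimal speaker mapping, both of which this sketch leaves out.

```python
def diarization_error_rate(missed, false_alarm, speaker_error, total_reference):
    """DER as a fraction of total reference speech time (all arguments in seconds)."""
    if total_reference <= 0:
        raise ValueError("total reference speech time must be positive")
    return (missed + false_alarm + speaker_error) / total_reference


# Hypothetical durations for a 1000 s excerpt of reference speech.
der = diarization_error_rate(missed=60.0, false_alarm=50.0,
                             speaker_error=80.0, total_reference=1000.0)
print(f"DER = {der:.1%}")  # DER = 19.0%
```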

    Towards a complete Binary Key System for the Speaker Diarization Task

    Speaker diarization is the task of partitioning an audio stream into homogeneous segments according to speaker identity. Today's state-of-the-art speaker diarization systems achieve very competitive performance. However, any small improvement in Diarization Error Rate (DER) usually comes at the cost of very long processing times (real-time factor above one), which makes these systems unsuitable for some time-critical, real-life applications. Recently, a novel fast speaker diarization technique based on speaker modeling with binary keys was presented. It runs up to ten times faster than real time with only a small increase in DER. Although the approach shows great potential, the results presented so far are preliminary. The goal of this paper is to investigate this technique further, in order to move towards a complete binary-key-based system for the speaker diarization task. Preliminary experiments on binary-key-based Speech Activity Detection (SAD) show the feasibility of the binary key modeling approach for this task. Furthermore, the system has been tested on two different kinds of data: meeting audio recordings and TV shows. Experiments carried out on the NIST RT05 and REPERE databases show promising results and indicate that there is still room for further improvement.
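    Much of the speed of binary-key modeling comes from comparing compact bit vectors instead of evaluating likelihoods. The following is a simplified sketch of the general idea, not the system evaluated in the paper. It assumes that for every frame we already know which components of a Gaussian background model (often called a KBM) scored highest. Those components are counted over a segment, and the most frequently selected ones are set to 1. Two keys are then compared with one commonly used similarity: the number of positions set in both keys divided by the number set in either. All function and parameter names are hypothetical.

```python
def binary_key(frame_top_components, kbm_size, key_ones):
    """Build a binary key from per-frame lists of top-scoring KBM component indices.

    frame_top_components: for each frame, indices of the background-model Gaussians
        that scored highest on it (computing those scores is out of scope here).
    kbm_size: number of Gaussian components in the background model.
    key_ones: how many of the most frequently selected components are set to 1.
    """
    counts = [0] * kbm_size
    for components in frame_top_components:
        for c in components:
            counts[c] += 1
    # Rank components by how often they were selected over the whole segment.
    ranked = sorted(range(kbm_size), key=lambda c: counts[c], reverse=True)
    key = [0] * kbm_size
    for c in ranked[:key_ones]:
        key[c] = 1
    return key


def binary_key_similarity(key1, key2):
    """Positions set in both keys divided by positions set in either key."""
    both = sum(a & b for a, b in zip(key1, key2))
    either = sum(a | b for a, b in zip(key1, key2))
    return both / either if either else 0.0


key1 = binary_key([[0, 1], [0, 2], [0, 1]], kbm_size=6, key_ones=2)
key2 = binary_key([[0, 2], [2, 3], [0, 2]], kbm_size=6, key_ones=2)
print(key1, key2, binary_key_similarity(key1, key2))
```

    In a diarization loop, such similarities between segment or cluster keys would drive the clustering decisions, replacing costlier model comparisons.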

    Characterization of the Pathological Voices (Dysphonia) in the frequency space

    This paper addresses dysphonic voice assessment. It aims to study the characteristics of dysphonia in the frequency domain. To this end, a GMM-based automatic classification system is coupled with a frequency subband architecture in order to investigate which frequency bands are relevant for characterizing dysphonia. Across various experiments, the low frequencies ([0-3000] Hz) tend to be more useful for dysphonia discrimination than the higher frequencies.
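    One simple way to restrict analysis to a subband such as [0-3000] Hz is to keep only the FFT bins whose centre frequencies fall inside it before computing features. The sketch below assumes a 16 kHz sampling rate and a 512-point FFT, neither of which is stated in the abstract. The paper's actual subband front-end is also not described here, so treat this as an illustration of the band selection only.

```python
def subband_bins(sample_rate, n_fft, low_hz, high_hz):
    """Indices of one-sided FFT bins whose centre frequency lies in [low_hz, high_hz]."""
    n_bins = n_fft // 2 + 1  # one-sided spectrum: DC up to Nyquist
    return [k for k in range(n_bins) if low_hz <= k * sample_rate / n_fft <= high_hz]


def restrict_to_subband(power_spectrum, sample_rate, n_fft, low_hz, high_hz):
    """Keep only the spectral values of one frame that fall inside the subband."""
    return [power_spectrum[k] for k in subband_bins(sample_rate, n_fft, low_hz, high_hz)]


# Assumed front-end settings: 16 kHz audio, 512-point FFT (31.25 Hz per bin).
bins = subband_bins(sample_rate=16000, n_fft=512, low_hz=0, high_hz=3000)
print(len(bins), bins[-1])  # 97 96
```

    Features such as filterbank energies or cepstra would then be computed from the retained bins only, and a separate GMM classifier trained per band configuration.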

    Evaluation of an Automatic Alignment of Dysarthric Speech (Evaluation d'un alignement automatique sur la parole dysarthrique)

    Phonetic-acoustic analysis of pathological speech requires a reliable phonetic alignment. Since manual labeling is highly time-consuming, automatic alignment may be necessary for analyzing large databases. This paper evaluates the reliability of automatic alignment for dysarthric speech. Results on read speech samples from 4 dysarthric speakers, compared with 2 normophonic (control) speakers, show that alignment performance depends on the severity of dysarthria. Specific patterns for different phonetic classes, and directions for filtering out the reliable portions of the alignment, are discussed.
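    The abstract does not name the agreement measure used. A common way to score an automatic alignment against a manual one is the proportion of phone boundaries placed within a fixed tolerance of the manual boundaries. The sketch below assumes a 20 ms tolerance and two alignments of the same phone sequence, compared position by position. The boundary times are made up for illustration.

```python
def boundary_agreement(automatic, manual, tolerance=0.020):
    """Fraction of boundaries placed within `tolerance` seconds of the manual ones.

    Both lists hold boundary times (in seconds) for the same phone sequence,
    so they are compared position by position.
    """
    if len(automatic) != len(manual):
        raise ValueError("alignments must contain the same number of boundaries")
    if not manual:
        raise ValueError("no boundaries to compare")
    hits = sum(1 for a, m in zip(automatic, manual) if abs(a - m) <= tolerance)
    return hits / len(manual)


manual = [0.10, 0.25, 0.40, 0.62]
automatic = [0.11, 0.26, 0.45, 0.61]
print(boundary_agreement(automatic, manual))  # 0.75
```

    Computing this score separately per speaker, or per phonetic class, is one way to expose the severity-dependent and class-specific patterns the abstract mentions.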

    Phonetic Analysis in the Frequency Domain for the Classification of Dysphonic Voices (Analyse Phonétique dans le Domaine Fréquentiel pour la Classification des Voix Dysphoniques)

    Concerned with pathological voice assessment, this paper aims to characterize dysphonia in the frequency domain for a better understanding of the related phenomena, whereas most studies have focused only on improving classification systems for diagnostic support. Building on a first study showing that low frequencies ([0-3000] Hz) are more relevant for dysphonia discrimination than higher frequencies, the authors analyze the impact of the restricted frequency subband ([0-3000] Hz) on dysphonic voice discrimination from a phonetic point of view. To this end, the performance of a GMM-based system for the automatic classification of dysphonia grade is measured across different phoneme classes and frequency bands ([0-3000] and [0-8000] Hz).

    Comparison of Phonetic Analyses of Dysarthric Speech Based on Manual and Automatic Alignment (Comparaison d'analyses phonétiques de parole dysarthrique basées sur un alignement manuel et un alignement automatique)

    The reliability of an automatic speech alignment procedure for the phonetic description of dysarthric speech is assessed by comparing durational and spectral measurements obtained from automatic and manual alignments of productions by 4 dysarthric speakers of varying severity. Results show that formant values computed at the middle of vowel intervals, and the center of gravity of fricative noise computed over consonant intervals, are reliable when based on automatic alignments. However, the analysis of pause occurrences and absolute segmental durations requires manual correction of the automatic output.

    Acoustic-phonetic decoding for speech intelligibility evaluation in the context of Head and Neck Cancers

    In addition to health problems, Head and Neck Cancers (HNC) can cause serious speech disorders that may lead to partial or complete loss of speech intelligibility in some patients. The clinician's evaluation of the intelligibility level before or after surgical treatment and/or during the rehabilitation phase is an important part of the clinical assessment. Perceptual assessment is the most widely used method in clinical practice for assessing a patient's intelligibility, despite its limitations, such as subjectivity and moderate reproducibility. In this paper, we propose to overcome these limitations by combining a specific speech production task based on pseudo-words with an automatic speech processing system, both oriented towards acoustic-phonetic decoding. Compared with human perception, the automatic system reaches very high correlation rates and gives promising results on a French speech corpus of 41 healthy speakers and 85 patients with HNC.
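    Agreement between automatic and perceptual intelligibility scores is typically reported as a correlation coefficient. The abstract does not specify which one, so the sketch below assumes Pearson's coefficient computed per speaker. The score lists in the example are hypothetical placeholders rather than data from the corpus.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-speaker scores: perceptual ratings vs. automatic decoding scores.
perceptual = [2.1, 3.4, 5.0, 6.8, 8.2]
automatic = [25.0, 41.0, 52.0, 70.0, 88.0]
print(round(pearson(perceptual, automatic), 3))
```

    A rank-based coefficient (Spearman) is a reasonable alternative when the perceptual scale is ordinal.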

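    Statistical Modeling and Relevant Information for the Characterization of Pathological Voices (Dysphonia) (Modélisation statistique et informations pertinentes pour la caractérisation des voix pathologiques (dysphonies))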

    This paper addresses the importance of the type of information used for the automatic classification of voices produced by patients with vocal dysfunction. Using a GMM classification system (derived from automatic speaker recognition), the focus was placed on three main classes of information: information related to energy, information from voiced segments, and information organized by phonetic segments. Experiments on a corpus of dysphonic speakers showed that the phonetic information is particularly useful in this context, since it allows the results to be analyzed per phoneme or per phoneme class.

    Application of Automatic Speaker Recognition techniques to pathological voice assessment (dysphonia)

    This paper investigates the adaptation of Automatic Speaker Recognition (ASR) techniques to pathological voice assessment (dysphonic voices). The aim of this study is to provide a novel method suited to tracking the evolution of a patient's pathology: easy to use, fast, non-invasive for the patient, and affordable for clinicians. This method is intended to complement the existing ones, namely perceptual judgment and the usual objective measurements (jitter, airflow, etc.), which remain costly in time and human resources. The system designed for this task relies on the GMM-based approach, the state of the art for speaker recognition, and is derived from the LIA lab's open-source speaker recognition tools (LIA_SpkDet and ALIZE). Experiments conducted on a dysphonic corpus provide promising results, underlining the interest of such an approach and opening further research directions.
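    At the heart of a GMM-based classifier, each class (here, for example, a dysphonia grade) is scored by the average log-likelihood of the test frames under its GMM, and the best-scoring class wins. The sketch below shows only that decision rule with diagonal covariances. It does not use the LIA_SpkDet or ALIZE APIs. It also assumes the class models are already trained; in speaker-recognition-style systems they are usually adapted from a universal background model. The class names and the 1-dimensional toy models are hypothetical.

```python
import math


def diag_gmm_loglik(frame, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_gauss = -0.5 * sum(
            math.log(2 * math.pi * v) + (x - m) ** 2 / v
            for x, m, v in zip(frame, mu, var)
        )
        component_logs.append(math.log(w) + log_gauss)
    top = max(component_logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in component_logs))


def classify(frames, class_models):
    """Pick the class whose GMM gives the highest average frame log-likelihood."""
    def avg_loglik(model):
        return sum(diag_gmm_loglik(f, *model) for f in frames) / len(frames)
    return max(class_models, key=lambda name: avg_loglik(class_models[name]))


# Hypothetical single-component, 1-D models: (weights, means, variances).
models = {
    "grade_0": ([1.0], [[0.0]], [[1.0]]),
    "grade_3": ([1.0], [[5.0]], [[1.0]]),
}
print(classify([[0.2], [-0.1]], models))  # grade_0
```

    Averaging over frames, rather than summing, keeps scores comparable across recordings of different lengths. It also makes it straightforward to restrict scoring to frames of a given phoneme class, as in the phonetic analyses listed above.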